The Role of Second Trials in Cascades of Information over Networks 
We study the propagation of information in social networks. To do so, we focus on a cascade
model where nodes are infected with probability $p_1$ after their first contact with the information
and with probability $p_2$ at all subsequent contacts. The diffusion starts from one random node and
leads to a cascade of infection. It is shown that first and subsequent trials play different roles in
the propagation and that the size of the cascade depends in a non-trivial way on $p_1$, $p_2$ and on the
network structure. Second trials are shown to amplify the propagation in dense parts of the network
while first trials are dominant for the exploration of new parts of the network and launching new
seeds of infection.

PACS numbers: 89.75.-k, 02.50.Le, 05.50.+q, 75.10.Hk 
I. INTRODUCTION



The propagation of information and new ideas has long been a fundamental question in the
social sciences. Propagation may be driven by exogenous causes, when people are informed in a
mean-field way by an external source, e.g. television, but also by endogenous mechanisms, when
a few early adopters influence their friends, who may in turn influence their own friends and
possibly lead to a cascade of influence [1]. This self-organizing process, which is reminiscent
of the dynamics of an epidemic, is usually called the word-of-mouth phenomenon. It has attracted
growing attention in the last few years due to the emergence of the internet and of online social
networks, which have led to more decentralized media of communication. A typical example is the
blogosphere, where blogs are written and read by web users and where debates and discussions may
take place among the bloggers. As of today, the blogosphere is extremely influential in the
adoption or rejection of products but also in politics, as more and more citizens voice their
opinions and mobilize community efforts around their candidates. From a practical point of view,
the emergence of these participative media has changed the way elections take place, by allowing
politicians to reach new audiences, raise money, communicate with voters and even consider all of
them as a gigantic think tank [2]. It has also opened new ways to promote commercial products via
recommendation networks or viral marketing methods. It is therefore interesting to better
understand how such information cascades take place in social networks.

A good description of the word-of-mouth phenomenon requires two elements: a model of
propagation and a network structure. The model of propagation defines the way information
(e.g. a marketing campaign for a specific product or a piece of news) flows between
acquaintances. One of the most common models of propagation is the Independent Cascade Model
(ICM), where one starts from an initial set of infected nodes. When a new node becomes infected,
it tries one single time to infect each of its neighbors with independent probability $p$. The
process stops when no new node has been infected. The size of the information cascade is given
by the number of infected nodes, and one says that an epidemic outbreak takes place when the
fraction of people who are infected does not vanish as the network size increases (keeping in
mind that the models described in this paper apply only to information diffusion, not to the
epidemic spread of diseases). It is straightforward to show that ICM is equivalent to the
epidemiological SIR model, where nodes are divided into three classes, i.e.
susceptible/infectious/removed, and where infectious nodes infect their neighbors with rate $p$
and are removed with rate 1. It is also possible to view ICM as a bond percolation problem, the
final number of infected nodes being the sum of the sizes of the connected components the
initial nodes belong to. Second, this viral process has to be applied on a realistic social
network, where each node represents a member of the society and edges are drawn between
acquaintances. For a long time the design of these social networks was purely theoretical and
real social networks were generally limited in size, but the advent of the Internet and of cheap
computer power now makes it possible to study social networks composed of millions of
individuals and to characterize the statistical properties of their topology. For instance, it
has been shown that social networks typically exhibit the small-world property [10],
heavy-tailed degree distributions [11], assortative mixing [13], modular structure, etc. An
important challenge is therefore to understand how the topology of the social network affects
the propagation of information, but also to find statistical indicators for the most influential
nodes in the network [1].

The ICM is a direct implementation of an epidemiological model in a social context. There
are, however, drastic differences between the propagation of a virus and the propagation of
ideas. Indeed, recent experiments have shown that the memory of the individuals may play a
dominant role in the latter case. For instance, in the case of recommendation networks, the
probability that people buy an item depends in a non-trivial way on the number of times they
received a recommendation for this item [17]. In the case of online social networks, it was
also shown that the probability to join a community depends on the number of one's friends in
that community [18]. In general, empirical studies show that the probability of getting
infected increases with the number of contacts $k$ and saturates for large values of $k$.
Several models have been introduced in order to take this property into account, such as
general ICM, threshold and cascade models, or generalized voter models. The way such dynamics
is affected by the network topology is, however, still poorly understood, even though some
studies focus on specific topologies [13]. The goal of this paper is to bridge this gap by
focusing on a generalization of ICM which includes in the simplest way a dependence on the
number of contacts. The model is applied on small-world networks in order to highlight the
importance of the network randomness. As a first step, we focus on simplified cases where the
network is directed, which allows us to obtain an analytical description of the propagation. It
is shown that the birth of large cascades of information is strongly influenced by the network
topology and that first and subsequent trials play very different roles in the propagation.
Computer simulations are also performed on directed and on more realistic undirected networks,
and confirm the above observations.
II. PROPERTIES OF THE MODEL

Our generalization of ICM is defined as follows. The network is composed of $N$ nodes and one
node is initially infected. Each time a new node is infected, it contacts all of its
neighbours, and they each get infected with probability $p_1$ if it is the first time they are
contacted and with probability $p_2$ for all subsequent contacts. The dynamics stops when no
new node is infected. The classical ICM is therefore recovered when $p_1 = p_2$. Since the ICM
and the SIR model are equivalent, one can also interpret the generalized ICM as an extension of
the SIR model. The dependence on the number of contacts leads to a new class of nodes, namely
contacted nodes, which have already been unsuccessfully attacked by infectious nodes. In that
framework, a susceptible node attacked by a neighboring infectious node becomes infected with
probability $p_1$ and becomes contacted with probability $1 - p_1$. When a contacted node is
attacked by an infectious node, its probability to become infected is $p_2$. Finally, an
infectious node becomes removed once it has attacked each of its neighbors. The model can also
be related to threshold models, where each node receives a random threshold generated following
a given distribution. A node becomes infected when the number of infected neighbors reaches its
threshold. The probability of having a threshold of value 1 is the probability of being
infected at the first trial, in our case $p_1$. In this way, one can generate for every couple
$(p_1, p_2)$ the thresholds $\theta$ of the equivalent threshold model with the following
expressions

$$ P(\theta = 1) = p_1 \tag{1} $$
$$ P(\theta = k) = (1 - p_1)(1 - p_2)^{k-2} p_2 \quad \forall k \geq 2. \tag{2} $$

It is also interesting to note that our model may be related to percolation. The case
$p_1 = p_2$ is well known to be equivalent to bond percolation, but the case $p_2 = 0$ can also
be seen as a node percolation problem. Indeed, in that case, each neighbour of an infected node
is infected with probability $p_1$ only if it is the first time it is in contact with the
information. The total number of infected nodes may therefore be obtained by removing nodes
from the network with probability $1 - p_1$ and by looking at the size of the connected
components. For general values of $p_1$ and $p_2$, however, the system is much more complicated
and the probabilities of infection are not straightforward to compute.

FIG. 1: Illustration of the generalized ICM. Infected nodes contact their neighbours only once.
These neighbours get infected with probability $p_1$ if it is the first time they are contacted
(and therefore remain uninfected with probability $1 - p_1$) and $p_2$ otherwise. The presence
of triangles and, by extension, of local structures is crucial for second and subsequent trials
to be frequent.
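The dynamics of the generalized ICM can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the function name `simulate_generalized_icm`, the adjacency-dictionary representation of the network and the breadth-first order in which infected nodes contact their neighbours are all choices made for this illustration.

```python
import random
from collections import deque


def simulate_generalized_icm(neighbors, seed_node, p1, p2, rng=None):
    """Run one cascade of the generalized ICM and return the set of infected nodes.

    neighbors: dict mapping each node to the list of nodes it points to.
    p1: infection probability at the first contact.
    p2: infection probability at every subsequent contact.
    Assumption: newly infected nodes are processed in breadth-first order.
    """
    rng = rng or random.Random()
    infected = {seed_node}
    contacted = set()  # nodes that resisted at least one trial
    queue = deque([seed_node])
    while queue:
        u = queue.popleft()
        # An infected node contacts each of its neighbours exactly once.
        for v in neighbors[u]:
            if v in infected:
                continue
            p = p2 if v in contacted else p1
            if rng.random() < p:
                infected.add(v)
                queue.append(v)
            else:
                contacted.add(v)
    return infected


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [2], 2: []}
    print(len(simulate_generalized_icm(triangle, 0, p1=0.5, p2=0.9)))
```

With $p_1 = 0$ the cascade never leaves the seed, whatever the value of $p_2$, because a second trial on a node requires that at least one other neighbour has already been infected by a first trial.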



III. RANDOM NETWORKS 

In this paper, we are interested in the conditions for a large cascade to emerge. We
therefore look for the critical couple $(p_{1c}, p_{2c})$ such that a random node infects a
non-vanishing fraction of the network for any couple $(p_1, p_2) > (p_{1c}, p_{2c})$, where the
inequalities are componentwise. This couple determines the epidemic threshold of the network.
Let us first focus on a directed random Erdős-Rényi network, composed of $N$ nodes and where
the probability to have a link between two randomly selected nodes is $p_{er}$. As we will
show, the proportion of second attacks vanishes when $N$ tends to infinity when one is below
the epidemic threshold. Therefore the threshold in such a topology hardly depends on $p_2$ and
we recover the same threshold as for the ICM. Even though this result was predictable, the
probability $p_2$ still plays a role when the size of the network is finite. Let $S(t)$,
$C(t)$, $I(t)$ and $R(t)$ be the number of susceptible, contacted, infectious and removed nodes
respectively at time $t$. By using a mean-field approximation, one obtains the number of links
between different types of nodes. For instance, the number of links going from infectious nodes
to susceptible nodes is $S(t) I(t) p_{er}$, which also represents the number of attacks at time
$t$ from infectious nodes on susceptible nodes. The average number of susceptible nodes that
become infected at time $t$ is therefore given by $S(t) I(t) p_{er} p_1$. Similar calculations
lead to the set of equations

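$$
\begin{aligned}
\dot{s} &= -s\,i\,d \\
\dot{c} &= s\,i\,d\,(1 - p_1) - c\,i\,d\,p_2 \\
\dot{i} &= -i + s\,i\,d\,p_1 + c\,i\,d\,p_2 \\
\dot{r} &= i
\end{aligned}
\tag{3}
$$

for the densities $s$, $c$, $i$ and $r$, where $s = S/N$, $c = C/N$, $i = I/N$, $r = R/N$,
and where $d = N p_{er}$ is the average degree of the network.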
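To get a feeling for these mean-field equations, one can integrate them numerically. The following is a minimal sketch, assuming an explicit Euler scheme; the function name `integrate_mean_field`, the step size `dt`, the integration horizon `t_max` and the initial infectious density `i0` are illustrative choices that do not come from the text.

```python
def integrate_mean_field(p1, p2, d, i0=1e-4, dt=0.01, t_max=100.0):
    """Integrate the densities (s, c, i, r) with explicit Euler steps.

    d is the average degree N * p_er. The initial condition (assumed here)
    is a small infectious density i0, all other nodes being susceptible.
    """
    s, c, i, r = 1.0 - i0, 0.0, i0, 0.0
    for _ in range(int(t_max / dt)):
        ds = -s * i * d
        dc = s * i * d * (1.0 - p1) - c * i * d * p2
        di = -i + s * i * d * p1 + c * i * d * p2
        dr = i
        s, c, i, r = s + dt * ds, c + dt * dc, i + dt * di, r + dt * dr
    return s, c, i, r


if __name__ == "__main__":
    for p1, p2 in [(0.3, 0.5), (0.8, 0.0), (0.8, 0.8)]:
        s, c, i, r = integrate_mean_field(p1, p2, d=2.0)
        print(f"p1={p1}, p2={p2}: final removed density r = {r:.4f}")
```

For $d p_1 < 1$ the removed density stays of the order of the initial seed, while for $d p_1 > 1$ it reaches a finite value that grows with $p_2$. The values obtained above the threshold should be read qualitatively only, since the mean-field equations neglect simultaneous attacks on the same node.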

The epidemic threshold is found by linearizing this nonlinear dynamical system around the
stationary solution $x_0 = (1, 0, 0, 0)$ where all nodes are susceptible, and looking at the
eigenvalues of the linearized matrix. The behaviour of the system is then essentially governed
by the linearized equation $\dot{i} = i(-1 + d p_1)$, which implies that $x_0$ is stable if
$d p_1 < 1$ and therefore that the infection will not reach a non-vanishing fraction of the
network in that case. This result, which is well known in percolation theory when $p_2 = p_1$,
also shows that the epidemic threshold does not depend on the parameter $p_2$. This may be
understood by noting that second and subsequent trials are statistically relevant only when a
finite fraction of nodes has been infected, which implies that the epidemic threshold may be
evaluated without taking them into account. Conversely, for a non-vanishing initial fraction of
contacted nodes, the threshold does depend on $p_2$ and changes accordingly. Above the epidemic
threshold, the system of equations (3) ceases to be valid because it does not incorporate
multiple attacks (i.e. several edges attacking a node at the same time), thereby leading to an
overestimation of the number of infections. In that case, we have therefore performed computer
simulations of the model, which show that the total fraction of nodes $r(\infty)$ having been
infected increases with $p_2$, as expected. This becomes even more pronounced as $N$ decreases.
Finally, when $N$ is relatively small, the proportion of second attacks is no longer negligible
and the threshold varies with both $p_1$ and $p_2$.



IV. DIRECTED SMALL-WORLD NETWORK

In order to highlight the role played by the network topology, we have applied the model on
a directed version of the well-known Watts-Strogatz model for small-world networks. The main
reason for looking at this directed version is that the equations of propagation become
tractable. However, the simulations show that both cases, directed and undirected, exhibit
similar couples of thresholds. The directed version is built from a directed one-dimensional
lattice of $N$ sites with periodic boundary conditions, i.e., a ring, each vertex $k$ pointing
to its 2 neighbors $k + 1$ and $k + 2$ (see Fig. 2). With probability $\phi$, these "regular"
links are removed and replaced by random links. This network therefore exhibits an interplay
between order and randomness. By increasing the parameter $\phi$, one increases the randomness
of the topology and one recovers a random network when $\phi = 1$.
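The construction of the directed small-world network can be sketched as follows. This is only an illustration: the function name `directed_small_world` is hypothetical, and the way a rewired link picks its new target (uniformly among all other nodes, with repeated targets allowed) is an assumption, since the text only states that regular links are replaced by random ones.

```python
import random


def directed_small_world(n, phi, rng=None):
    """Return a dict mapping each node k to the list of its two out-neighbours.

    Each node k initially points to k+1 and k+2 on a ring of n nodes.
    With probability phi, a regular link is replaced by a link to a
    random node (assumption: uniform over nodes other than k).
    """
    rng = rng or random.Random()
    neighbors = {}
    for k in range(n):
        targets = []
        for offset in (1, 2):
            if rng.random() < phi:
                t = rng.randrange(n - 1)  # uniform over the n-1 other nodes
                if t >= k:
                    t += 1                # skip k itself to avoid a self-loop
                targets.append(t)
            else:
                targets.append((k + offset) % n)
        neighbors[k] = targets
    return neighbors


if __name__ == "__main__":
    print(directed_small_world(8, 0.0))
    print(directed_small_world(8, 0.2, random.Random(0)))
```

Every node keeps exactly two outgoing links for any value of `phi`, so the average out-degree is 2, as in the directed lattice.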




FIG. 2: For different values of $\phi$, the topology is a regular lattice ($\phi = 0$), a
small-world network ($\phi = 0.2$) or a random network ($\phi = 1$).

It is instructive to first consider the case of a regular lattice, i.e., $\phi = 0$. In
that case, the information propagates in the system in an ordered way and the state of each
site $k$ is only influenced by the sites $k - 2$ and $k - 1$. For this reason one no longer
needs to store the state "contacted" separately. Let $n_{ij;k}$, with $i, j \in \{0, 1\}$, be
the probability that node $k$ is in state $i$ and node $k + 1$ is in state $j$, with the
correspondence 1 = infectious, 0 = not infectious. By definition, $\sum_{ij} n_{ij;k} = 1$ for
any $k$. Let us assume that one starts the propagation at node $k = 1$, so that
$n_{01;0} = 1$. Then, it is straightforward to show that the quantities $n_{ij;k}$ satisfy the
recurrence

$$
\begin{aligned}
n_{11;k+1} &= \left(p_1 + (1 - p_1) p_2\right) n_{11;k} + p_1\, n_{01;k} \\
n_{01;k+1} &= p_1\, n_{10;k} \\
n_{10;k+1} &= \left(1 - p_1 - (1 - p_1) p_2\right) n_{11;k} + (1 - p_1)\, n_{01;k}
\end{aligned}
\tag{4}
$$

while the probability that the dynamics ends grows monotonically like

$$ n_{00;k+1} = n_{00;k} + (1 - p_1)\, n_{10;k}. \tag{5} $$

This corresponds to the 4-state Markov chain represented in Fig. 3. By definition, the
expected number of infected nodes is $N_\infty = \sum_{k \geq 0} \left(n_{01;k} + n_{11;k}\right)$.
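The recurrence (4) can be iterated numerically to obtain the expected number of infected nodes on the regular lattice. The sketch below is an illustration under stated assumptions: the function name `expected_cascade_size_lattice` is hypothetical, and infected nodes are counted by summing, at each step $k$, the probability $n_{01;k} + n_{11;k}$ that node $k + 1$ is infectious, which is one way of reading the definition of $N_\infty$.

```python
def expected_cascade_size_lattice(p1, p2, tol=1e-12, max_steps=10**6):
    """Iterate the recurrence (4) from n_{01;0} = 1 and sum infection probabilities."""
    n11, n01, n10 = 0.0, 1.0, 0.0   # the seed is node 1, i.e. state (0, 1) at k = 0
    a = p1 + (1.0 - p1) * p2        # node k+2 is infected when k and k+1 both are
    total = 0.0
    for _ in range(max_steps):
        total += n01 + n11          # probability that node k+1 is infectious
        n11, n01, n10 = (a * n11 + p1 * n01,
                         p1 * n10,
                         (1.0 - a) * n11 + (1.0 - p1) * n01)
        if n11 + n01 + n10 < tol:   # almost all probability has reached state (0, 0)
            break
    return total


if __name__ == "__main__":
    for p1, p2 in [(0.5, 0.0), (0.5, 0.5), (0.5, 0.9)]:
        print(p1, p2, round(expected_cascade_size_lattice(p1, p2), 3))
```

The sum is finite whenever $p_1 < 1$ and $p_2 < 1$, and it grows with both probabilities, in line with the one-dimensional picture described in the text.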

The asymptotic number of infected nodes grows like the largest eigenvalue of the matrix
associated with the linear system (4),

$$
A = \begin{pmatrix}
p_1 + (1 - p_1) p_2 & p_1 & 0 \\
0 & 0 & p_1 \\
(1 - p_1)(1 - p_2) & 1 - p_1 & 0
\end{pmatrix},
\tag{6}
$$

which acts on the vector $(n_{11;k}, n_{01;k}, n_{10;k})$. This largest eigenvalue is smaller
than 1 for any $p_1$, $p_2$, except when $p_1 = 1$ or $p_2 = 1$, which implies that an epidemic
outbreak takes place only in these trivial cases. In contrast, when $p_1$ and $p_2$ are
different from 1, only a finite number of nodes gets asymptotically infected. This is due to
the one-dimensionality of the topology, which implies that two nodes at most may spread the
infection at each step and that the probability that no new node gets infected is different
from zero when $p_1 \neq 1$ and $p_2 \neq 1$. As expected, increasing values of $p_1$ or $p_2$
increase the total number of infected nodes. The analytical expression for $N_\infty$ when
$p_1, p_2 \neq 1$ is given by

$$ N_\infty = \frac{1 - (1 - p_1) p_2}{(1 - p_1)^2 (1 - p_2)}. \tag{7} $$

FIG. 3: On a regular lattice, the states of the nodes $k$ and $k + 1$, denoted by $i$ and $j$
respectively, fully determine the state of node $k + 2$. The dynamics is therefore specified by
the succession of states $(i, j)$. The dynamics ends when two successive zeros, i.e. a state
(0,0), take place.

Let us now focus on a topology where a fraction of the links is displaced in a random way.
In order to generalize the results obtained for the regular lattice, it is useful to label each
node with its position $k$ on the underlying one-dimensional lattice. By construction, each
node $k$ points to $k + 1$ and $k + 2$ when $\phi = 0$, but such links only exist with
probability $1 - \phi$ in general. In a system where $\phi$ is sufficiently small and where
only a vanishing fraction of the nodes gets activated, one may decouple the dynamics as
follows. The initial seed may infect a segment of nodes which are contiguous on the underlying
lattice, thereby leading to $N_1(p_1, p_2)$ contiguous infected nodes. This number may be
evaluated by generalizing the set of equations (4) and taking into account the fact that some
links are missing. The matrix associated with this linear system is

$$
A_\phi = (1 - \phi)^2 A
+ 2\phi(1 - \phi) \begin{pmatrix}
p_1 & p_1/2 & 0 \\
0 & 0 & p_1/2 \\
1 - p_1 & 1 - p_1/2 & 0
\end{pmatrix}
+ \phi^2 \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
1 & 1 & 0
\end{pmatrix}
\tag{8}
$$



where we consider three cases: with probability $(1 - \phi)^2$ no link is missing, and we
recover the matrix $A$ of Eq. (6); with probability $2\phi(1 - \phi)$ one link is missing,
which gives the corresponding transition matrix; and with probability $\phi^2$ both links are
missing, which gives a simple transition matrix. By using arguments similar to those for the
regular lattice, one finds that the average number of contiguously infected nodes is



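$$
N_1(p_1, p_2) = \frac{1 - (1 - p_1)\, p_2 (1 - \phi)^2 - p_1 \phi (1 - \phi)}
{\left(1 - p_1 (1 - \phi)\right)\left(1 - p_2 (1 - \phi)^2 - p_1 (1 - \phi)\left(1 - p_2 (1 - \phi) + \phi\right)\right)}.
\tag{9}
$$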
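As a quick numerical check of Eq. (9), the segment size can be coded directly. This is a hedged sketch: the function name `segment_size` is hypothetical, and the expression used is the form of Eq. (9) obtained from the three-state chain on $(n_{11}, n_{01}, n_{10})$ with missing links, so it should be treated as a reading of the garbled formula rather than an authoritative transcription.

```python
def segment_size(p1, p2, phi):
    """Average number of contiguously infected nodes N_1(p1, p2) for rewiring phi."""
    q = 1.0 - phi  # probability that a regular link is still present
    numerator = 1.0 - (1.0 - p1) * p2 * q**2 - p1 * phi * q
    denominator = (1.0 - p1 * q) * (
        1.0 - p2 * q**2 - p1 * q * (1.0 - p2 * q + phi))
    return numerator / denominator


if __name__ == "__main__":
    for phi in (0.0, 0.1, 0.5, 1.0):
        print(phi, round(segment_size(0.5, 0.5, phi), 4))
```

Two limits are easy to verify by hand: for `phi = 1` no regular link survives and the segment reduces to the seed itself, so the function returns 1, and for `phi = 0` with `p1 = p2 = 0.5` it returns 6.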



This segment of $N_1(p_1, p_2)$ infected nodes may in turn infect $2\phi p_1 N_1(p_1, p_2)$
distant nodes, each of which plays the role of a new seed and infects a new segment of average
size $N_1(p_1, p_2)$, etc. Below the epidemic threshold, only a vanishing proportion of nodes
is infected and one may assume that the different segments do not overlap. The total number of
infected nodes is therefore



$$ N_\infty = N_1(p_1, p_2) \sum_{i=0}^{\infty} \left(2\phi p_1 N_1(p_1, p_2)\right)^i \tag{10} $$

which converges to

$$ N_\infty = \frac{N_1(p_1, p_2)}{1 - 2\phi p_1 N_1(p_1, p_2)} \tag{11} $$

if

$$ 2\phi p_1 N_1(p_1, p_2) < 1. \tag{12} $$
FIG. 4: Couples of thresholds for $\phi \in \{0.3, 0.1, 0.01, 0.001\}$. In the limit
$\phi \to 1$ of a random network, the critical line is vertical, $p_1 = 1/2$, and the epidemic
threshold is therefore independent of $p_2$. The different symbols represent experimental
couples of thresholds obtained with a precision of 0.01. The slight shifts with respect to the
theoretical curves come from the finite size of the network ($10^4$ nodes); this effect
increases for $\phi$ close to 0.

FIG. 5: Total fraction of infected nodes in log scale as a function of $p_1$, for $p_2 = 0$
and $p_2 = 0.8$ respectively. The network is composed of $10^4$ nodes and $\phi = 0.1$.
Vertical lines correspond to the theoretical prediction $p_{1c}$ where cascades occur.

The line $2\phi p_1 N_1(p_1, p_2) = 1$ therefore separates two regimes, one in which the
spreading dies out and another one in which an infinite number of nodes is asymptotically
infected. By using Eq. (9) and solving Eq. (12), one finds an analytical formula for the
critical value



$$
p_{2c} = \frac{1 - (2 + \phi - \phi^2)\, p_1 + (1 - \phi + \phi^2 - \phi^3)\, p_1^2}
{(1 - \phi)^2 (1 - p_1)(1 - p_1 - p_1 \phi)}
\tag{13}
$$



such that an epidemic takes place when $p_2 > p_{2c}$ (see Fig. 4). It is interesting to note
that the epidemic threshold depends both on $p_1$ and $p_2$ for general values of $\phi$, but
that these parameters are associated with different mechanisms. The probability $p_2$ plays an
important role in the local propagation of the infection among neighbouring sites. The
probability $p_1$ also plays a role in such propagation, but it is in addition responsible for
the infection of new distant seeds, a process that is crucial for exploring several
disconnected parts of the network and that favours the emergence of an epidemic. One observes
from (9) and (12) that $p_2$ is less and less important as $\phi$ increases. In the limit
$\phi \to 1$ of a random network, the length of infected segments $N_1(p_1, p_2)$ goes to 1,
which implies that the epidemic threshold is $p_1 = 1/2$, independently of $p_2$, as predicted
in our analysis of the Erdős-Rényi network. It is also interesting to note that the total
number of infected nodes (11) may decrease when $\phi$ is increased, which is in contradiction
with the usual belief that short-cuts promote the propagation.
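The threshold condition (12), the cascade size (11) and the critical value (13) can be checked against each other numerically. The sketch below is illustrative only: the function names are hypothetical, and the segment size is written in a compact form, $N_1 = (1 - a + b)/((1 - a)(1 - b))$ with $a = (1-\phi)^2 (p_1 + (1-p_1)p_2) + 2\phi(1-\phi)p_1$ and $b = (1-\phi)p_1$, which is assumed here to be an algebraic rewriting of Eq. (9).

```python
def segment_size(p1, p2, phi):
    """N_1(p1, p2): compact form assumed equivalent to Eq. (9)."""
    q = 1.0 - phi
    a = q * q * (p1 + (1.0 - p1) * p2) + 2.0 * phi * q * p1  # both neighbours infected
    b = q * p1                                               # a single infected neighbour
    return (1.0 - a + b) / ((1.0 - a) * (1.0 - b))


def total_cascade_size(p1, p2, phi):
    """Eq. (11); returns infinity when condition (12) is violated."""
    n1 = segment_size(p1, p2, phi)
    ratio = 2.0 * phi * p1 * n1  # average number of new seeds per segment
    return n1 / (1.0 - ratio) if ratio < 1.0 else float("inf")


def critical_p2(p1, phi):
    """Eq. (13): value of p2 at which 2 * phi * p1 * N_1 = 1."""
    numerator = (1.0 - (2.0 + phi - phi**2) * p1
                 + (1.0 - phi + phi**2 - phi**3) * p1**2)
    denominator = (1.0 - phi) ** 2 * (1.0 - p1) * (1.0 - p1 - p1 * phi)
    return numerator / denominator


if __name__ == "__main__":
    p1, phi = 0.6, 0.1
    p2c = critical_p2(p1, phi)
    print("p2c =", p2c)
    print("2*phi*p1*N1 at p2c =", 2.0 * phi * p1 * segment_size(p1, p2c, phi))
```

In this sketch, `critical_p2(0.5, phi)` evaluates to 1 for any `phi`, so all critical lines meet at $(p_1, p_2) = (1/2, 1)$, which is consistent with the vertical line $p_1 = 1/2$ obtained in the random-network limit.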

We have checked the validity of (12) by performing computer simulations of the generalized
ICM on a directed small-world network with $N = 10^4$ nodes and by averaging the results over
$10^4$ realizations of the dynamics. As shown in Fig. 5, the critical threshold for a given
$p_2$ is evaluated by looking at the probability $p_1$ for which the slope of the total
fraction of infected nodes is maximal when the Y-axis is in log scale. In Fig. 4 these critical
points are drawn for $\phi = 0.3$, $\phi = 0.1$ and $\phi = 0.01$. The case $\phi = 0.001$ is
not shown because of the very small number of short-cuts in that case and therefore of the very
large fluctuations from one realization of the network to another. The simulation results show
large fluctuations but are nonetheless in good agreement with the theoretical predictions.
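The maximal-slope criterion used to locate the threshold can be sketched as follows. This is a minimal illustration, assuming the simulated fractions are available on a grid of $p_1$ values; the function name `estimate_threshold`, the finite-difference slope and the choice of returning the midpoint of the steepest interval are assumptions, and the numbers in the usage example are made up for the sake of the demonstration.

```python
import math


def estimate_threshold(p1_values, fractions):
    """Return the midpoint of the p1 interval where log(fraction) rises fastest.

    p1_values: increasing grid of first-trial probabilities.
    fractions: average fraction of infected nodes measured at each grid point
               (all values must be strictly positive).
    """
    best_slope = -math.inf
    best_p1 = None
    for j in range(len(p1_values) - 1):
        slope = (math.log(fractions[j + 1]) - math.log(fractions[j])) / (
            p1_values[j + 1] - p1_values[j])
        if slope > best_slope:
            best_slope = slope
            best_p1 = 0.5 * (p1_values[j] + p1_values[j + 1])
    return best_p1


if __name__ == "__main__":
    # Hypothetical data, not taken from the paper.
    grid = [0.1, 0.2, 0.3, 0.4, 0.5]
    measured = [1e-4, 2e-4, 5e-2, 1e-1, 2e-1]
    print(estimate_threshold(grid, measured))
```

The resolution of such an estimate is limited by the grid spacing, which is one way to interpret the precision of 0.01 quoted for the experimental thresholds.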

Finally, we have also studied our model numerically when it is applied to an undirected
small-world network made of $10^4$ nodes and with an average degree of 4. As expected (the mean
degree is twice as large), the frontiers are shifted to the left, meaning that smaller
probabilities are sufficient to observe significant cascades in the network (see Fig. 6).
Qualitatively, however, the system behaves in the same way as in the directed case and the
lines determining the epidemic threshold have similar shapes. Theoretically, when $\phi = 1$,
the network is random and the epidemic threshold should not depend on $p_2$, i.e. it is a
vertical line. However, the finite size of the network implies that the proportion of triangles
does not vanish and therefore that second attacks may occur due to finite-size effects.
Consequently the experiments show a slight dependency on $p_2$ and the frontier is not exactly
vertical when $\phi = 1$. However, we recover the threshold of the ICM when
$p_1 = p_2 = 0.25$.
V. CONCLUSION

In this paper, we have focused on a very simple model for the cascade of information in
social networks. The novelty of the model consists in considering different probabilities for
being infected depending on the number of contacts with the information. The model has been
applied on a directed small-world network in order to show how the randomness of the network
topology affects the propagation. It is shown that first and subsequent trials play very
different roles: first trials are essential in order to discover unexplored parts of the
network and launch new seeds of infection, while second and subsequent trials influence the
propagation in ordered parts of the network, where triangles (and other dense motifs) are
frequent. The epidemic threshold, which determines the success of the cascade, depends in a
non-trivial way on these two mechanisms and on the randomness of the network topology, but it
is dominated by the success of first trials.

The importance of first trials should be put in perspective with Granovetter's famous work
on "The Strength of Weak Ties" [33], which states that weak links keep the network connected
whereas strong links are mostly concentrated within communities. In the context of information
diffusion, our model shows that the first trials play a similar cohesive role by connecting
different communities, while second and subsequent trials accelerate the propagation inside the
communities. This is due to the fact that dense parts of the network make possible the
existence of several infected paths to each node, and therefore increase the number of times
one node is contacted. In the extreme scenario of a $k$-clique, for instance, where $k$ nodes
are fully connected, after the first step, all further steps will be considered as second
trials.

To conclude, our model is motivated by recent experiments which have shown that an
accumulation of contacts favours the propagation of information and that, in particular, second
and subsequent trials are more successful than first trials. Interestingly, our model also
reproduces the fact that locally dense subnetworks accelerate the propagation [34], [35], a
property which has been observed for the adoption of new services among users of a mobile phone
network [36] and which is not reproduced by the original ICM.